DATA 202 - Week 14

Multivariate regression

Nathan Alexander, PhD

Center for Applied Data Science and Analytics

Part I: Context

Now that we have many foundational elements identified and practiced - such as generating code to explore data, cleaning data for analysis, and some elements of theory construction - we can begin focusing on some of the important technical components of model building and analysis: interpretation.

  • Interpretation relies very heavily on both your research question and the subsequent empirical study.

  • While your research question may be based on a host of factors, your empirical study relies on a combination of:

    • Theoretical frameworks

    • Analytic method

    • Interpretations

A suggestive and indicative model of the triangulation method, from Tzagkarakis & Kritas (2023).

Research questions

The research questions below highlight the intersection of social justice issues in quantitative analyses with multiple variables. Keep in mind that these questions can be further refined and tailored to specific contexts or issues of interest within the realm of social justice.

  1. How do income inequality and geographical location affect access to quality education?

  2. What disparities exist in the criminal justice system by race and gender?

  3. How do gender discrimination and age impact career advancement in the workplace?

  4. What are the effects of housing policies and income on residential segregation and access to affordable housing?

  5. How do healthcare accessibility and affordability vary across different socioeconomic groups?

Sample analysis

Let us continue with a sample analysis.

We will assume that state data were collected for a sample of 100 randomly selected cities requesting funding after the approval of a new bill on affordable housing. The data set includes a city identifier and three key variables.

Research question

What is the relationship between state funding for affordable housing initiatives and the availability of new affordable housing units?

Details about each variable are provided below:

  • city is a marker (which matches the data index) used to indicate a randomly selected city.

  • funding is the total amount of funding provided to families (in thousands of dollars) in a given 3-week period.

  • housing_availability is the average number of city housing units allocated over the same funding period.

  • advocacy is the average number of calls to the state representatives’ hotline four months prior.

The advocacy variable was generated as a result of a similar study conducted in a neighboring state, which noticed that there was a potential lag-relationship between advocacy and funding allocations approved at the state-level.

head(data)
  city funding housing_availability advocacy
1    1  251.32                34.60    21.15
2    2  422.71                50.32    35.27
3    3  418.33                45.47    28.54
4    4  423.08                46.43    27.23
5    5  503.13                55.97    36.67
6    6  428.78                60.77    27.49
tail(data)
    city funding housing_availability advocacy
95    95  248.48                55.39    18.27
96    96  385.40                58.23    33.48
97    97  314.17                67.69    26.19
98    98  222.00                52.38    22.95
99    99  317.35                57.37    18.10
100  100  463.09                59.65    27.02
summary(data)
      city           funding      housing_availability    advocacy    
 Min.   :  1.00   Min.   :216.2   Min.   :34.60        Min.   :16.19  
 1st Qu.: 25.75   1st Qu.:280.4   1st Qu.:48.54        1st Qu.:22.52  
 Median : 50.50   Median :343.4   Median :54.97        Median :27.04  
 Mean   : 50.50   Mean   :360.4   Mean   :54.54        Mean   :26.63  
 3rd Qu.: 75.25   3rd Qu.:437.7   3rd Qu.:59.55        3rd Qu.:30.30  
 Max.   :100.00   Max.   :547.4   Max.   :77.86        Max.   :37.34  

Exploration

We can use some base-R commands to get a quick summary of each variable.

# get plots of variables
hist(funding)

hist(housing_availability)

# get summary statistics for variables
summary(funding)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  216.2   280.4   343.4   360.4   437.7   547.4 
summary(housing_availability)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  34.60   48.54   54.97   54.54   59.55   77.86 

We can also produce quick plots to examine the relationship between each pair of variables.

Here, we include code to get the correlation coefficient.

# perform correlation analysis
plot(funding, housing_availability)

cor(funding, housing_availability)
[1] 0.266359
plot(advocacy, funding)

cor(advocacy, funding)
[1] 0.4757307
plot(advocacy, housing_availability)

cor(advocacy, housing_availability)
[1] 0.1444811

Interpretation

First, researchers decided to run a linear regression model on housing_availability and funding.

# perform linear regression analysis
model1 <- lm(housing_availability ~ funding)

# summary of the regression model
summary(model1)

Call:
lm(formula = housing_availability ~ funding)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.1793  -5.9060  -0.6551   5.0543  22.4049 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 45.49080    3.41826  13.308  < 2e-16 ***
funding      0.02511    0.00918   2.736  0.00739 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.579 on 98 degrees of freedom
Multiple R-squared:  0.07095,   Adjusted R-squared:  0.06147 
F-statistic: 7.484 on 1 and 98 DF,  p-value: 0.007391
# plot the data and regression line
library(ggplot2)
ggplot(data, aes(x = funding, y = housing_availability)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "City Funding", y = "Housing Availability", title = "Relationship between City Funding and Housing Availability")
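
Two numbers in the regression summary can be checked by hand, which is a useful habit when reading model output. The sketch below uses only values printed earlier (no new data): in a simple linear regression the multiple R-squared equals the squared correlation between the two variables, and each t value is the estimate divided by its standard error. The object names are illustrative.

```r
# values copied from the output above
r_xy     <- 0.266359   # cor(funding, housing_availability)
estimate <- 0.02511    # slope for funding
std_err  <- 0.00918    # standard error of the slope

# in simple linear regression, R-squared is the squared correlation
r_squared <- r_xy^2
round(r_squared, 5)

# the t value is the estimate divided by its standard error
t_value <- estimate / std_err
round(t_value, 2)
```

An R-squared of about 0.07 means that funding alone accounts for roughly 7% of the variation in housing availability in this sample, even though the slope is statistically significant.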

One researcher, however, suggested comparing the OLS results with a robust regression analysis. Robust regression, as you may recall, reduces the influence of outliers on the estimates.

Note: we need to install the MASS package and load it with library() to run the following code.

library(MASS) # provides stdres() and rlm()
ols <- lm(housing_availability ~ funding)
summary(ols)

Call:
lm(formula = housing_availability ~ funding)

Residuals:
     Min       1Q   Median       3Q      Max 
-18.1793  -5.9060  -0.6551   5.0543  22.4049 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 45.49080    3.41826  13.308  < 2e-16 ***
funding      0.02511    0.00918   2.736  0.00739 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.579 on 98 degrees of freedom
Multiple R-squared:  0.07095,   Adjusted R-squared:  0.06147 
F-statistic: 7.484 on 1 and 98 DF,  p-value: 0.007391
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0))
plot(ols, las = 1)
par(opar) # we use the par() function to restore graphical parameters to their original values

From this analysis, we see that a few observations are possibly problematic to our model.

We can explore some of these observations in more detail.

data[c(12, 50, 73), 1:4]
   city funding housing_availability advocacy
12   12  396.66                77.86    30.15
50   50  470.96                75.79    22.38
73   73  217.93                68.29    27.94

The three cities noted (and there may be others) have large residuals.

We can examine these in more detail.

distance <- cooks.distance(ols) # we get a measure of the Cook's distance values.
res <- stdres(ols)
a <- cbind(data, distance, res)
a[distance > 4/100, ]
   city funding housing_availability advocacy   distance       res
1     1  251.32                34.60    21.15 0.04984715 -2.029432
7     7  216.20                64.53    17.56 0.04561739  1.614402
12   12  396.66                77.86    30.15 0.04014294  2.626744
16   16  495.17                73.25    25.20 0.05225347  1.813880
50   50  470.96                75.79    22.38 0.05837888  2.179622
73   73  217.93                68.29    27.94 0.07252929  2.053558
75   75  243.32                67.02    20.18 0.04371737  1.820393
78   78  236.61                36.76    16.89 0.04260811 -1.734093

The code above relies on the following functions:

  • cooks.distance() computes Cook’s distance, a measure of the influence of each data point on the fitted regression.

  • stdres() returns the standardized residuals from our model.

  • cbind() attaches the two measures to our data frame.

To select the values to display, we use the commonly recommended cutoff point \(4/n\), where \(n\) is the sample size; here \(n = 100\), so the cutoff is 0.04.

We then take the absolute value of the residuals (the sign does not matter when ranking by size) and print the observations with the largest absolute residuals (here we focus on the top 10 values).

absres <- abs(res)
data1 <- cbind(data, distance, res, absres)
assorted <- data1[order(-absres), ]
assorted[1:10,]
   city funding housing_availability advocacy   distance       res   absres
12   12  396.66                77.86    30.15 0.04014294  2.626744 2.626744
50   50  470.96                75.79    22.38 0.05837888  2.179622 2.179622
88   88  317.76                35.29    20.73 0.02780093 -2.131960 2.131960
25   25  286.74                70.57    22.02 0.03637332  2.100555 2.100555
73   73  217.93                68.29    27.94 0.07252929  2.053558 2.053558
1     1  251.32                34.60    21.15 0.04984715 -2.029432 2.029432
85   85  279.11                36.86    19.93 0.03027410 -1.839822 1.839822
75   75  243.32                67.02    20.18 0.04371737  1.820393 1.820393
16   16  495.17                73.25    25.20 0.05225347  1.813880 1.813880
78   78  236.61                36.76    16.89 0.04260811 -1.734093 1.734093

We now run our robust regression analysis.

We do this by using the rlm() function in the MASS package.

There are several weight functions that can be used with the iteratively reweighted least squares (IRLS) technique.

rrmodel <- rlm(housing_availability ~ funding, data = data)
summary(rrmodel)

Call: rlm(formula = housing_availability ~ funding, data = data)
Residuals:
     Min       1Q   Median       3Q      Max 
-17.9198  -5.4416  -0.3424   5.2609  22.8048 

Coefficients:
            Value   Std. Error t value
(Intercept) 45.7779  3.5798    12.7880
funding      0.0234  0.0096     2.4328

Residual standard error: 8.213 on 98 degrees of freedom

The default weight is the Huber weight.

Huber weights are a type of weight function used to downweight or mitigate the influence of outliers on the estimation procedure.

In traditional least squares regression, all data points are given equal weight, and the estimation procedure is sensitive to the presence of outliers. The use of weights in our robust regression model aims to provide more robust estimates by assigning different weights to the observations, giving less influence to outliers.

hweights <- data.frame(city = data$city, resid = rrmodel$resid, weight = rrmodel$w)
hweights2 <- hweights[order(rrmodel$w),]
hweights2[1:15,]
   city     resid    weight
12   12  22.80484 0.4843946
50   50  18.99708 0.5814916
25   25  18.08570 0.6107904
88   88 -17.91981 0.6164041
73   73  17.41506 0.6343102
1     1 -17.05588 0.6476276
16   16  15.89084 0.6951638
75   75  15.55123 0.7103366
85   85 -15.44584 0.7151312
43   43  14.97271 0.7377868
97   97  14.56416 0.7584838
78   78 -14.55183 0.7590662
9     9  14.36036 0.7692537
7     7  13.69553 0.8065877
33   33 -12.74093 0.8669457

Huber weights assign larger weights to observations that are close to the regression line and smaller weights to observations that deviate significantly from the line. The weight assigned to each observation depends on its residuals (the difference between the observed values and the predicted values).
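
The weights in the table can be approximately reproduced by hand. The sketch below assumes the defaults documented for MASS::rlm() (Huber's psi function with tuning constant k = 1.345, and residuals scaled by the robust scale estimate that the summary reports as the residual standard error, 8.213). huber_weight() is a helper written for illustration, not a MASS function.

```r
# Huber weight: 1 for small scaled residuals, k / |u| beyond the cutoff k
# (huber_weight is an illustrative helper, not part of MASS)
huber_weight <- function(resid, scale, k = 1.345) {
  u <- abs(resid / scale)          # scaled absolute residual
  ifelse(u <= k, 1, k / u)         # downweight only beyond the cutoff
}

scale_est <- 8.213                 # robust scale from summary(rrmodel)

# residuals for cities 12, 50 and 33, copied from the table above
w <- huber_weight(c(22.80484, 18.99708, -12.74093), scale = scale_est)
round(w, 3)

# a residual inside the cutoff keeps full weight
huber_weight(5, scale = scale_est)
```

Because the scale value is rounded, the results agree with the printed weights only to about three decimal places.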

Causality

Despite our work on the initial model, the issue of causality needs to be discussed.

There are a few considerations that need to be taken into account:

  • Confounding variables: There may be other factors that influence housing availability apart from city funding. For example, economic conditions, local housing markets, and social policies can also play significant roles. Failing to account for these confounding variables may lead to erroneous conclusions about the causal relationship.

  • Reverse causality: The relationships can be bidirectional. Higher housing availability rates may lead to increased city funding directed at addressing the issue. Thus, it’s possible that the relationship is driven by reverse causality, where higher levels of housing availability cause increased funding rather than the other way around.

  • Omitted variable bias: There may be unobserved or unmeasured factors that affect both city funding and housing availability. Failing to include these variables in the analysis can lead to omitted variable bias, potentially distorting the estimated relationships.

  • Ecological fallacy: Analyzing data aggregated at the state and city levels may not capture nuances in the relationship. Aggregating data can lead to an ecological fallacy, where conclusions drawn at the aggregate level may not hold at other levels, such as neighborhoods or individual households.

Multicollinearity

Multicollinearity refers to a high correlation or linear relationship between two or more predictor variables in a regression model. In a model with two predictors, such as funding and advocacy, multicollinearity occurs when the two predictors are strongly linearly related to each other, making it difficult to separate their individual effects on the response variable. This can cause unstable coefficient estimates, inflated standard errors, and difficulties in interpreting the coefficients.
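
One common diagnostic is the variance inflation factor (VIF). With exactly two predictors, the VIF for each is 1 / (1 - r^2), where r is the correlation between them. The sketch below applies this to the correlation between advocacy and funding reported earlier (0.4757307); the rule of thumb in the comment is a convention, not a strict cutoff.

```r
# correlation between the two predictors, copied from cor(advocacy, funding)
r_pred <- 0.4757307

# with two predictors, each VIF equals 1 / (1 - r^2)
vif_two <- 1 / (1 - r_pred^2)
round(vif_two, 2)

# rule of thumb (a convention only): values above roughly 5 to 10
# are often read as a sign of problematic multicollinearity
vif_two > 5
```

A value near 1.3 suggests that the overlap between funding and advocacy inflates the coefficient standard errors only modestly. For models with more predictors, the vif() function in the car package computes the same quantity.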

Assume we updated our theoretical statement and research question and added the advocacy variable to our model.

# perform linear regression analysis
model2 <- lm(housing_availability ~ funding + advocacy)

# summary of the regression model
summary(model2)

Call:
lm(formula = housing_availability ~ funding + advocacy)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.9890  -6.1250  -0.6158   4.9763  22.3024 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 44.80516    4.77827   9.377 2.97e-15 ***
funding      0.02408    0.01049   2.296   0.0238 *  
advocacy     0.03969    0.19229   0.206   0.8369    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.621 on 97 degrees of freedom
Multiple R-squared:  0.07136,   Adjusted R-squared:  0.05221 
F-statistic: 3.727 on 2 and 97 DF,  p-value: 0.02759

Interaction effects

Next, we add an interaction term to our model.

# get a summary of the advocacy data
summary(advocacy)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  16.19   22.52   27.04   26.63   30.30   37.34 
# examine the relationship between funding and advocacy
cor(advocacy, funding)
[1] 0.4757307
# perform linear regression analysis
model3 <- lm(housing_availability ~ funding + advocacy + funding*advocacy)

# summary of the regression model
summary(model3)

Call:
lm(formula = housing_availability ~ funding + advocacy + funding * 
    advocacy)

Residuals:
     Min       1Q   Median       3Q      Max 
-17.9963  -6.2218  -0.5457   4.8889  22.3465 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)   
(Intercept)      49.0944885 17.5511591   2.797  0.00623 **
funding           0.0117777  0.0495659   0.238  0.81268   
advocacy         -0.1236422  0.6712607  -0.184  0.85425   
funding:advocacy  0.0004576  0.0018009   0.254  0.79997   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.663 on 96 degrees of freedom
Multiple R-squared:  0.07198,   Adjusted R-squared:  0.04298 
F-statistic: 2.482 on 3 and 96 DF,  p-value: 0.06555

Please note that we may need to run additional tests or more robust models to inform interpretation.
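
One way to compare the three models is the adjusted R-squared, which penalizes each added predictor. The sketch below recomputes it from the multiple R-squared values printed in the three summaries, using n = 100 cities; adj_r2() is a helper written for illustration.

```r
# adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)
# n = number of observations, p = number of predictors
# (adj_r2 is an illustrative helper, not a base-R function)
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

r2 <- c(model1 = 0.07095, model2 = 0.07136, model3 = 0.07198)  # from the summaries
p  <- c(1, 2, 3)                                               # predictors per model

adj <- adj_r2(r2, n = 100, p = p)
round(adj, 4)
```

Multiple R-squared barely moves as advocacy and the interaction term are added, while adjusted R-squared falls from about 0.061 to about 0.043, which suggests the extra terms add little explanatory value here.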

Statistical vs. practical significance

When analyzing the relationship between state funding and housing availability, it is important to consider both statistical significance and practical significance.

Statistical significance refers to whether the observed relationship or difference between variables is larger than we would expect from chance alone. It is assessed through hypothesis tests and their p-values, where a p-value is the probability of observing a result at least as extreme as ours if there were truly no relationship. In this context, statistical significance would indicate whether there is evidence to suggest that state funding is associated with housing availability. A statistically significant result suggests that the observed relationship would be unlikely to arise by random chance alone.

Practical significance focuses on the magnitude or practical importance of the observed relationship. It asks whether the observed effect size is meaningful or substantial in real-world terms. In the case of state funding and housing availability, practical significance would involve evaluating whether the observed impact of state funding on housing availability is large enough to have a meaningful or substantial effect on the availability of housing units.

Note, however, that while statistical significance provides evidence of a relationship, it does not necessarily imply practical importance. A statistically significant relationship may exist but have a negligible or trivial effect in practice. Conversely, a relationship may have practical significance, even if it does not reach statistical significance due to limited sample size or other factors.
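
To make the distinction concrete, the sketch below translates the funding slope from model1 into real-world units and attaches an approximate 95% confidence interval. It uses only the estimate, standard error, and degrees of freedom printed earlier; the $100,000 increase is an arbitrary illustrative value.

```r
# values copied from summary(model1)
slope  <- 0.02511   # change in housing_availability per additional $1,000
se     <- 0.00918   # standard error of the slope
df_res <- 98        # residual degrees of freedom

# expected change in housing availability for an extra $100,000
# (funding is recorded in thousands of dollars, so this is 100 units)
increase <- 100
effect   <- slope * increase
effect

# approximate 95% confidence interval for that change
t_crit <- qt(0.975, df = df_res)
ci     <- (slope + c(-1, 1) * t_crit * se) * increase
round(ci, 2)
```

The interval excludes zero, which matches the significant p-value, but whether a change of roughly 0.7 to 4.3 units per $100,000 is practically meaningful is a judgment that depends on the policy context.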

Replication studies

Exploring varied statistical outputs and their significance in a social justice context requires care, both in terms of the underlying theories that relate to the variables themselves and their use across different contexts. An additional factor that we have discussed relates to the role of the theoretical constructions and their applicability to issues of social injustice.

More often than not, caution should take the lead when developing new models. In these instances, some variation on what is known as a replication study can become a valuable tool. A replication study is a type of study that aims to reproduce or replicate the findings of a previous study. In the context of our course, the replication frameworks can be applied to examine the relationships between variables across contexts and different populations.

There are different types of replication studies.

  • Direct replication: In this replication study type, researchers attempt to reproduce the original study as closely as possible, meaning they follow the same research design, methodologies, and data analysis procedures.

  • Partial replication: In this replication study type, researchers attempt to replicate only a portion of the original study. Often, researchers doing a partial replication study focus on a specific aspect, variable, or component of the study.

  • Conceptual replication: In this replication study type, researchers conduct a replication analysis that focuses on the same research question(s) but through the use of different methods, measures, or population groups.

While replication studies are often used to help ensure the credibility and generalizability of statistical research findings, they can also serve as a part of a broader process to examine the role of context in statistical models. Importantly, failure to replicate the findings of a study does not necessarily mean that the original study findings were incorrect or flawed. Together, these types of explorations can contribute to scientific knowledge and provide evidence to help us understand the role of theory and the practice of social justice.

Beyond regression

Researchers have access to a wide range of advanced statistical techniques and methodologies that provide deeper insights into complex relationships and patterns within data. These approaches go beyond the linear relationships examined in regression analysis and allow researchers to explore non-linear, interactive, and dynamic effects among variables. By utilizing these advanced techniques, researchers can uncover hidden patterns, make more accurate predictions, account for complex interactions, and gain a more comprehensive understanding of the phenomena under investigation.

Some of these methods provide greater flexibility in handling missing data, dealing with outliers, and accommodating various types of data structures. Overall, these advanced statistical techniques expand the set of tools researchers can use to delve deeper into the complexities of their data and extract meaningful insights.

Part II: Content

Multiple Variable Analysis and Multivariate Analysis are two terms often used in statistics and research methodology to describe different approaches to analyzing data involving multiple variables. While they share similarities, there are distinct differences between these two concepts.

Multivariable vs. Multivariate

Multiple variable analysis investigates the influence of individual independent variables on a single dependent variable, while multivariate analysis explores the relationships and patterns among multiple variables simultaneously.

Multiple Variable Analysis is often used when studying the effects of specific factors, while multivariate analysis is employed to uncover broader patterns and structures within a dataset. Both approaches are valuable in data analysis, and the choice between them depends on the research objectives and the nature of the data being analyzed.

Definitions: Multiple variable analysis vs. Multivariate analysis

Multiple Variable Analysis: Multiple Variable Analysis refers to the process of examining the relationships between several independent variables and a single dependent variable. It aims to understand how each independent variable influences or predicts the dependent variable individually, while controlling for other variables. In this analysis, the independent variables are typically entered into a single model, often using techniques such as multiple regression or analysis of variance (ANOVA), so that each coefficient reflects one variable's association with the outcome while the others are held constant.

Multivariate Analysis: Multivariate Analysis involves the simultaneous analysis of multiple dependent and independent variables. It aims to explore the relationships and patterns among multiple variables, considering them as a whole. This analysis technique allows for the examination of complex interactions and associations between variables, providing a more comprehensive understanding of the data.

Key characteristics of multiple variable analysis

  1. Focus: Examining the impact of individual independent variables on a single dependent variable.

  2. Analytic approach: The effect of each independent variable is estimated while holding the others constant, allowing for isolation of their effects.

  3. Purpose: To identify the individual contributions and significance of multiple variables in explaining the variation in the dependent variable.

  4. Statistical techniques: Common techniques include simple linear regression, multiple linear regression, and ANOVA.

Key characteristics of multivariate analysis

  1. Focus: Examining the relationships and interactions among multiple variables simultaneously.

  2. Analytic approach: Considering all variables together, accounting for their joint effects and potential interdependence.

  3. Purpose: To explore patterns, associations, and structures within the data, identifying underlying factors or dimensions.

  4. Statistical techniques: Common techniques include factor analysis, principal component analysis, cluster analysis, and structural equation modeling.
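
As a small illustration of the multivariate approach, the sketch below runs a principal component analysis on four numeric columns of the built-in mtcars data using prcomp(). The choice of columns is arbitrary and only for demonstration.

```r
# select a few numeric variables (an arbitrary choice for illustration)
vars <- mtcars[, c("mpg", "disp", "hp", "wt")]

# standardize the variables so that no single unit of measurement dominates
pca <- prcomp(vars, center = TRUE, scale. = TRUE)

# proportion of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(var_explained, 3)

# loadings: how each original variable contributes to each component
round(pca$rotation, 2)
```

Here no variable is singled out as the response; all four are treated together, which is the defining feature of the multivariate perspective.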

Examples of multivariate analysis techniques

  • Principal component analysis (PCA): PCA is used to reduce the dimensionality of data by transforming it into a new set of uncorrelated variables called principal components. R functions for PCA include prcomp() and princomp().

  • Factor analysis: Factor Analysis aims to identify latent factors that explain the correlations among observed variables. R offers functions like factanal() and psych::fa() for conducting factor analysis.

  • Canonical correlation analysis (CCA): CCA examines the relationships between two sets of variables and identifies the linear combinations of each set that have maximum correlation with each other. The cancor() function in the stats package can be used for this analysis.

  • Cluster analysis: Cluster Analysis groups similar observations into clusters based on the similarity of their characteristics. R provides various clustering techniques, such as k-means clustering (kmeans()), hierarchical clustering (hclust()), and model-based clustering (Mclust() in the mclust package).

  • Discriminant analysis: Discriminant Analysis aims to find a linear combination of variables that maximally separate predefined groups or classes. The MASS package offers functions like lda() and qda() for performing Linear Discriminant Analysis (LDA) and Quadratic Discriminant Analysis (QDA), respectively.

  • Multivariate regression: Multivariate Regression extends linear regression to multiple response variables. The lm() function in R can be used for multivariate regression analysis by supplying a matrix of response variables.

  • Multivariate analysis of variance (MANOVA): MANOVA extends the analysis of variance (ANOVA) to multiple response variables simultaneously. The manova() function in R can be used for MANOVA.

  • Multidimensional scaling (MDS): MDS visualizes the similarity or dissimilarity between objects in a lower-dimensional space. R provides functions like cmdscale() and isoMDS() (the latter in the MASS package) for performing MDS.

  • Structural Equation Modeling (SEM): SEM is a comprehensive framework for testing complex relationships among variables. R packages like lavaan and sem offer functionalities for conducting SEM.

  • Correspondence Analysis: Correspondence Analysis explores the associations between categorical variables and visualizes them in a low-dimensional space. The ca() function in the ca package is commonly used for correspondence analysis.
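
For instance, a multivariate regression can be fit with lm() by binding several response variables together with cbind(). The sketch below uses the built-in mtcars data with two responses chosen only for illustration (mpg and qsec); the object names are hypothetical.

```r
# two response variables modeled from the same predictors
mv_model <- lm(cbind(mpg, qsec) ~ hp + wt, data = mtcars)

# one column of coefficients per response variable
coef(mv_model)

# each column matches the coefficients from a separate single-response fit
uni_mpg <- lm(mpg ~ hp + wt, data = mtcars)
all.equal(unname(coef(mv_model)[, "mpg"]), unname(coef(uni_mpg)))
```

The coefficient estimates are the same as in separate regressions; what the multivariate fit adds is the ability to test hypotheses across the responses jointly, for example with anova(mv_model).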

We will consider a few of these models in our final weeks for the course.

Part III: Code

This week, we use some standard data included in R to further discuss model interpretation.

While these data sets do not directly connect to the content of our course, they provide some useful examples to return to as they are discussed on many websites that use R and that can be found in online forums.

Each example illustrates different scenarios for interpreting linear models using the summary output. Remember to consider coefficients, standard errors, t-values, and p-values to assess the significance and direction of relationships between predictors and the response variable. Additionally, theory construction and relevant knowledge and context are crucial for a comprehensive interpretation of the results.

The mtcars data come from the 1974 Motor Trend US magazine. The data set comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). You could run similar models using data in the critstats package.

names(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
summary(mtcars)
      mpg             cyl             disp             hp       
 Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
 Median :19.20   Median :6.000   Median :196.3   Median :123.0  
 Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
 Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
      drat             wt             qsec             vs        
 Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
 Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
 Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
 Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
       am              gear            carb      
 Min.   :0.0000   Min.   :3.000   Min.   :1.000  
 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
 Median :0.0000   Median :4.000   Median :2.000  
 Mean   :0.4062   Mean   :3.688   Mean   :2.812  
 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
 Max.   :1.0000   Max.   :5.000   Max.   :8.000  

Example 1: Simple Linear Regression

# Fit a simple linear regression model
model <- lm(mpg ~ hp, data = mtcars)

# Print the model summary
summary(model)

Call:
lm(formula = mpg ~ hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.7121 -2.1122 -0.8854  1.5819  8.2360 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
hp          -0.06823    0.01012  -6.742 1.79e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.863 on 30 degrees of freedom
Multiple R-squared:  0.6024,    Adjusted R-squared:  0.5892 
F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

The summary output provides information about the coefficients, standard errors, t-values, and p-values. In this case, the intercept represents the estimated baseline miles per gallon (mpg) when horsepower is zero. The coefficient for horsepower indicates the estimated change in mpg for each unit increase in horsepower.
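
To see what the coefficients imply, the sketch below refits the same model under a separate name (fit_hp, chosen for illustration) and predicts mpg for two hypothetical horsepower values.

```r
# refit the simple regression under its own name
fit_hp <- lm(mpg ~ hp, data = mtcars)

# predicted mpg for cars with 100 and 200 horsepower (hypothetical values)
new_cars <- data.frame(hp = c(100, 200))
pred <- predict(fit_hp, newdata = new_cars)
round(pred, 2)

# the difference between the two predictions equals 100 times the slope
unname(pred[2] - pred[1])
100 * unname(coef(fit_hp)["hp"])
```

Because no car in the data has zero horsepower (the minimum is 52), the intercept is an extrapolation rather than a realistic baseline.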

Example 2: Multiple Linear Regression

# Fit a multiple linear regression model
model <- lm(mpg ~ hp + wt, data = mtcars)

# Print the model summary
summary(model)

Call:
lm(formula = mpg ~ hp + wt, data = mtcars)

Residuals:
   Min     1Q Median     3Q    Max 
-3.941 -1.600 -0.182  1.050  5.854 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
hp          -0.03177    0.00903  -3.519  0.00145 ** 
wt          -3.87783    0.63273  -6.129 1.12e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.593 on 29 degrees of freedom
Multiple R-squared:  0.8268,    Adjusted R-squared:  0.8148 
F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

The summary output provides interpretation for each coefficient. For example, the coefficient for horsepower represents the estimated change in mpg for each unit increase in horsepower, holding weight constant. Similarly, the coefficient for weight represents the estimated change in mpg for each unit increase in weight, holding horsepower constant.

Example 3: Categorical Predictor

# Fit a linear regression model with a categorical predictor
model <- lm(mpg ~ factor(cyl), data = mtcars)

# Print the model summary
summary(model)

Call:
lm(formula = mpg ~ factor(cyl), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.2636 -1.8357  0.0286  1.3893  7.2364 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   26.6636     0.9718  27.437  < 2e-16 ***
factor(cyl)6  -6.9208     1.5583  -4.441 0.000119 ***
factor(cyl)8 -11.5636     1.2986  -8.905 8.57e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.223 on 29 degrees of freedom
Multiple R-squared:  0.7325,    Adjusted R-squared:  0.714 
F-statistic:  39.7 on 2 and 29 DF,  p-value: 4.979e-09

When a categorical predictor, such as “cyl” (number of cylinders), is wrapped in factor(), R treats it as a set of dummy variables. The summary output provides a coefficient for each category level other than the reference level (here, 6 cylinders and 8 cylinders). These coefficients represent the estimated difference in the response variable (mpg) compared to the reference category (here, 4 cylinders), whose mean is given by the intercept.
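
The link between dummy-variable coefficients and group means can be verified directly. The sketch below refits the model under a separate name (fit_cyl, chosen for illustration) and compares its coefficients with the mean mpg in each cylinder group.

```r
# refit the categorical model under its own name
fit_cyl <- lm(mpg ~ factor(cyl), data = mtcars)
b <- unname(coef(fit_cyl))

# mean mpg within each cylinder group (4, 6, 8)
group_means <- tapply(mtcars$mpg, mtcars$cyl, mean)
round(group_means, 2)

# intercept = mean of the reference group (4 cylinders);
# intercept + coefficient = mean of each other group
rebuilt <- c("4" = b[1], "6" = b[1] + b[2], "8" = b[1] + b[3])
round(rebuilt, 2)
```

The two printed vectors should agree, which shows that this model simply estimates one mean per cylinder group.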

Example 4: Interaction Effect

# Fit a linear regression model with an interaction term
model <- lm(mpg ~ hp * wt, data = mtcars)

# Print the model summary
summary(model)

Call:
lm(formula = mpg ~ hp * wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.0632 -1.6491 -0.7362  1.4211  4.5513 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 49.80842    3.60516  13.816 5.01e-14 ***
hp          -0.12010    0.02470  -4.863 4.04e-05 ***
wt          -8.21662    1.26971  -6.471 5.20e-07 ***
hp:wt        0.02785    0.00742   3.753 0.000811 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.153 on 28 degrees of freedom
Multiple R-squared:  0.8848,    Adjusted R-squared:  0.8724 
F-statistic: 71.66 on 3 and 28 DF,  p-value: 2.981e-13

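When an interaction is specified with hp * wt, R expands the formula to hp + wt + hp:wt, so the summary output reports both main effects (horsepower and weight) as well as the interaction term. The interaction coefficient (0.02785) is the estimated change in the slope of mpg on horsepower for each one-unit increase in weight. With the interaction in the model, the hp coefficient is the horsepower slope when weight is 0, and the wt coefficient is the weight slope when horsepower is 0, so the main effects should not be interpreted in isolation.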
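The sketch below shows how the horsepower slope depends on weight in the interaction model. The helper hp_slope_at() and the weights chosen are hypothetical, introduced only for illustration.

```r
# Same model as Example 4
fit_int <- lm(mpg ~ hp * wt, data = mtcars)
b <- coef(fit_int)

# Slope of mpg with respect to hp at a given weight:
# (hp coefficient) + (interaction coefficient) * wt
hp_slope_at <- function(wt) unname(b["hp"] + b["hp:wt"] * wt)

# The slope moves toward zero as weight increases
hp_slope_at(c(2, 3, 4))
```

Since the interaction coefficient is positive while the hp coefficient is negative, the estimated penalty in mpg per unit of horsepower shrinks for heavier cars.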

# do some exploratory analysis on the survey data in the MASS package

library(MASS)   # provides the survey data; load before dplyr so dplyr::select() is not masked
library(dplyr) 
survey 
       Sex Wr.Hnd NW.Hnd W.Hnd    Fold Pulse    Clap Exer Smoke Height      M.I
1   Female   18.5   18.0 Right  R on L    92    Left Some Never 173.00   Metric
2     Male   19.5   20.5  Left  R on L   104    Left None Regul 177.80 Imperial
3     Male   18.0   13.3 Right  L on R    87 Neither None Occas     NA     <NA>
4     Male   18.8   18.9 Right  R on L    NA Neither None Never 160.00   Metric
5     Male   20.0   20.0 Right Neither    35   Right Some Never 165.00   Metric
6   Female   18.0   17.7 Right  L on R    64   Right Some Never 172.72 Imperial
7     Male   17.7   17.7 Right  L on R    83   Right Freq Never 182.88 Imperial
8   Female   17.0   17.3 Right  R on L    74   Right Freq Never 157.00   Metric
9     Male   20.0   19.5 Right  R on L    72   Right Some Never 175.00   Metric
10    Male   18.5   18.5 Right  R on L    90   Right Some Never 167.00   Metric
11  Female   17.0   17.2 Right  L on R    80   Right Freq Never 156.20 Imperial
12    Male   21.0   21.0 Right  R on L    68    Left Freq Never     NA     <NA>
13  Female   16.0   16.0 Right  L on R    NA   Right Some Never 155.00   Metric
14  Female   19.5   20.2 Right  L on R    66 Neither Some Never 155.00   Metric
15    Male   16.0   15.5 Right  R on L    60   Right Some Never     NA     <NA>
16  Female   17.5   17.0 Right  R on L    NA   Right Freq Never 156.00   Metric
17  Female   18.0   18.0 Right  L on R    89 Neither Freq Never 157.00   Metric
18    Male   19.4   19.2  Left  R on L    74   Right Some Never 182.88 Imperial
19    Male   20.5   20.5 Right  L on R    NA    Left Some Never 190.50 Imperial
20    Male   21.0   20.9 Right  R on L    78   Right Freq Never 177.00   Metric
21    Male   21.5   22.0 Right  R on L    72    Left Freq Never 190.50 Imperial
22    Male   20.1   20.7 Right  L on R    72   Right Freq Never 180.34 Imperial
23    Male   18.5   18.0 Right  L on R    64   Right Freq Never 180.34 Imperial
24    Male   21.5   21.2 Right  R on L    62   Right Some Never 184.00   Metric
25  Female   17.0   17.5 Right  R on L    64    Left Some Never     NA     <NA>
26    Male   18.5   18.5 Right Neither    90 Neither Some Never     NA     <NA>
27    Male   21.0   20.7 Right  R on L    90   Right Some Never 172.72 Imperial
28    Male   20.8   21.4 Right  R on L    62 Neither Freq Never 175.26 Imperial
29    Male   17.8   17.8 Right  L on R    76 Neither Freq Never     NA     <NA>
30    Male   19.5   19.5 Right  L on R    79   Right Some Never 167.00   Metric
31  Female   18.5   18.0 Right  R on L    76   Right None Occas     NA     <NA>
32    Male   18.8   18.2 Right  L on R    78   Right Freq Never 180.00   Metric
33  Female   17.1   17.5 Right  R on L    72   Right Freq Heavy 166.40 Imperial
34    Male   20.1   20.0 Right  R on L    70   Right Some Never 180.00   Metric
35    Male   18.0   19.0 Right  L on R    54 Neither Some Regul     NA     <NA>
36    Male   22.2   21.0 Right  L on R    66   Right Freq Occas 190.00   Metric
37  Female   16.0   16.5 Right  L on R    NA   Right Some Never 168.00   Metric
38    Male   19.4   18.5 Right  R on L    72 Neither Freq Never 182.50   Metric
39    Male   22.0   22.0 Right  R on L    80   Right Some Never 185.00   Metric
40    Male   19.0   19.0 Right  R on L    NA Neither Freq Occas 171.00   Metric
41  Female   17.5   16.0 Right  L on R    NA   Right Some Never 169.00   Metric
42  Female   17.8   18.0 Right  R on L    72   Right Some Never 154.94 Imperial
43    Male     NA     NA Right  R on L    60    <NA> Some Never 172.00   Metric
44  Female   20.1   20.2 Right  L on R    80   Right Some Never 176.50 Imperial
45  Female   13.0   13.0  <NA>  L on R    70    Left Freq Never 180.34 Imperial
46    Male   17.0   17.5 Right  R on L    NA Neither Freq Never 180.34 Imperial
47    Male   23.2   22.7 Right  L on R    84    Left Freq Regul 180.00   Metric
48    Male   22.5   23.0 Right  R on L    96   Right None Never 170.00   Metric
49  Female   18.0   17.6 Right  R on L    60   Right Some Occas 168.00   Metric
50  Female   18.0   17.9 Right  R on L    50    Left None Never 165.00   Metric
51    Male   22.0   21.5  Left  R on L    55    Left Freq Never 200.00   Metric
52    Male   20.5   20.0 Right  L on R    68   Right Freq Never 190.00   Metric
53    Male   17.0   18.0 Right  L on R    78    Left Some Never 170.18 Imperial
54    Male   20.5   19.5 Right  L on R    56   Right Freq Never 179.00   Metric
55    Male   22.5   22.5 Right  R on L    65   Right Freq Regul 182.00   Metric
56    Male   18.5   18.5 Right  L on R    NA Neither Freq Never 171.00   Metric
57  Female   15.5   15.4 Right  R on L    70 Neither None Never 157.48 Imperial
58    Male   19.5   19.7 Right  R on L    72   Right Freq Never     NA     <NA>
59    Male   19.5   19.0 Right  L on R    62   Right Freq Never 177.80 Imperial
60    Male   20.6   21.0  Left  L on R    NA    Left Freq Occas 175.26 Imperial
61    Male   22.8   23.2 Right  R on L    66 Neither Freq Never 187.00   Metric
62  Female   18.5   18.2 Right  R on L    72 Neither Freq Never 167.64 Imperial
63  Female   19.6   19.7 Right  L on R    70   Right Freq Never 178.00   Metric
64  Female   18.7   18.0  Left  L on R    NA    Left None Never 170.00   Metric
65  Female   17.3   18.0 Right  L on R    64 Neither Freq Never 164.00   Metric
66    Male   19.5   19.8 Right Neither    NA   Right Freq Never 183.00   Metric
67  Female   19.0   19.1 Right  L on R    NA Neither Freq Never 172.00   Metric
68  Female   18.5   18.0 Right  R on L    64   Right Freq Never     NA     <NA>
69    Male   19.0   19.0 Right  L on R    NA   Right Some Never 180.00   Metric
70    Male   21.0   19.5 Right  L on R    80    Left None  <NA>     NA     <NA>
71  Female   18.0   17.5 Right  L on R    64    Left Freq Never 170.00   Metric
72    Male   19.4   19.5 Right  R on L    NA   Right Freq Heavy 176.00   Metric
73  Female   17.0   16.6 Right  R on L    68   Right Some Never 171.00   Metric
74  Female   16.5   17.0 Right  L on R    40    Left Freq Never 167.64 Imperial
75  Female   15.6   15.8 Right  R on L    88    Left Some Never 165.00   Metric
76  Female   17.5   17.5 Right Neither    68   Right Freq Heavy 170.00   Metric
77  Female   17.0   17.6 Right  L on R    76   Right Some Never 165.00   Metric
78  Female   18.6   18.0 Right  L on R    NA Neither Freq Heavy 165.10 Imperial
79  Female   18.3   18.5 Right  R on L    68 Neither Some Never 165.10 Imperial
80    Male   20.0   20.5 Right  L on R    NA   Right Freq Never 185.42 Imperial
81    Male   19.5   19.5  Left  R on L    66    Left Some Never     NA     <NA>
82    Male   19.2   18.9 Right  R on L    76   Right Freq Never 176.50 Imperial
83  Female   17.5   17.5 Right  R on L    98    Left Freq Never     NA     <NA>
84  Female   17.0   17.4 Right  R on L    NA Neither Some Never     NA     <NA>
85    Male   23.0   23.5 Right  L on R    90   Right Freq Never 167.64 Imperial
86  Female   17.7   17.0 Right  R on L    76   Right Some Never 167.00   Metric
87  Female   18.2   18.0 Right  L on R    70   Right Some Never 162.56 Imperial
88  Female   18.3   18.5 Right  R on L    75    Left Freq Never 170.00   Metric
89    Male   18.0   18.0 Right Neither    60   Right Freq Never 179.00   Metric
90  Female   18.0   17.7  Left  R on L    92    Left Some Never     NA     <NA>
91    Male   20.5   20.0 Right  R on L    75    Left Some Never 183.00   Metric
92  Female   17.5   18.0 Right Neither    NA   Right Some Never     NA     <NA>
93  Female   18.2   17.5 Right  L on R    70   Right Some Never 165.00   Metric
94  Female   18.2   18.5 Right  R on L    NA   Right Some Never 168.00   Metric
95    Male   21.3   20.8 Right  R on L    65   Right Freq Heavy 179.00   Metric
96  Female   19.0   18.8 Right  L on R    NA   Right Some Never     NA     <NA>
97    Male   20.0   19.5 Right  R on L    68 Neither Freq Regul 190.00   Metric
98  Female   17.5   17.5 Right  R on L    60   Right Freq Never 166.50   Metric
99    Male   19.5   19.4 Right Neither    NA   Right Freq Never 165.00   Metric
100 Female   19.4   19.6 Right  R on L    68 Neither Freq Never 175.26 Imperial
101   Male   21.9   22.2 Right  R on L    NA   Right Some Never 187.00   Metric
102   Male   18.9   19.1 Right  L on R    60 Neither None Never 170.00   Metric
103 Female   16.0   16.0 Right Neither    NA   Right Some Never 159.00   Metric
104 Female   17.5   17.3 Right  R on L    72   Right Freq Never 175.00   Metric
105 Female   17.5   17.0 Right  R on L    80    Left Some Heavy 163.00   Metric
106 Female   19.5   18.5 Right  R on L    80   Right Some Never 170.00   Metric
107 Female   16.2   16.4 Right  R on L    NA   Right Freq Occas 172.00   Metric
108 Female   17.0   15.9 Right  R on L    85   Right Freq Never     NA     <NA>
109   Male   17.5   17.5 Right  L on R    64 Neither Freq Never 180.00   Metric
110   Male   19.7   20.1 Right  R on L    67    Left Some Regul 180.34 Imperial
111 Female   18.5   18.5 Right  R on L    76    Left Freq Never 175.00   Metric
112   Male   19.2   19.6 Right  L on R    80   Right None Never 190.50 Imperial
113 Female   17.2   16.7 Right  R on L    75   Right Freq Never 170.18 Imperial
114   Male   20.5   21.0 Right  R on L    60   Right Freq Never 185.00   Metric
115 Female   16.0   15.5 Right  L on R    60    Left Freq Never 162.56 Imperial
116 Female   16.9   16.0 Right  L on R    70   Right None Never 158.00   Metric
117 Female   17.0   16.7 Right  R on L    70   Right Some Never 159.00   Metric
118   Male   23.0   22.0  Left  L on R    83    Left Some Heavy 193.04 Imperial
119 Female   18.5   18.0  Left  L on R   100 Neither Some Never 171.00   Metric
120   Male   21.0   20.4 Right  L on R   100   Right Freq Heavy 184.00   Metric
121   Male   20.0   20.0 Right  R on L    80 Neither Freq Occas     NA     <NA>
122   Male   22.5   22.5 Right  L on R    76   Right Freq Occas 177.00   Metric
123 Female   18.5   18.0 Right  R on L    92   Right Freq Never 172.00   Metric
124   Male   19.8   20.0  Left  L on R    59   Right Freq Never 180.00   Metric
125   Male   18.5   18.1 Right  L on R    66    Left Freq Never 175.26 Imperial
126   Male   19.3   19.4 Right  R on L    NA   Right Freq Never 180.34 Imperial
127 Female   16.0   16.0 Right  R on L    68   Right Freq Never 172.72 Imperial
128   Male   18.8   19.1 Right  L on R    66 Neither Freq Regul 178.50   Metric
129 Female   17.5   17.0 Right  R on L    74   Right Freq Never 157.00   Metric
130 Female   16.4   16.5 Right  L on R    90   Right Some Never 152.00   Metric
131   Male   22.0   21.5 Right  R on L    86   Right Freq Never 187.96 Imperial
132   Male   19.0   19.5 Right  L on R    60   Right Some Never 178.00   Metric
133 Female   18.9   20.0 Right  R on L    86   Right Some Never     NA     <NA>
134 Female   15.4   16.4  Left  L on R    80    Left Freq Occas 160.02 Imperial
135   Male   17.9   17.8 Right  R on L    85    Left Some Never 175.26 Imperial
136   Male   23.1   22.5 Right  L on R    90   Right Some Regul 189.00   Metric
137   <NA>   19.8   19.0  Left  L on R    73 Neither Freq Never 172.00   Metric
138   Male   22.0   22.0 Right  L on R    72   Right Freq Never 182.88 Imperial
139   Male   20.0   19.5 Right  L on R    NA   Right Freq Never 170.00   Metric
140 Female   19.5   18.5 Right  L on R    68   Right None Never 167.00   Metric
141 Female   18.0   18.6 Right  R on L    84   Right Some Never 175.00   Metric
142 Female   18.3   19.0 Right  R on L    NA   Right None Never 165.00   Metric
143 Female   19.0   18.8 Right  R on L    65   Right Freq Never 172.72 Imperial
144   Male   21.4   21.0 Right  L on R    96 Neither Some Never 180.00   Metric
145 Female   20.0   19.5  Left  R on L    68 Neither Freq Never 172.00   Metric
146   Male   18.5   18.5 Right  R on L    75 Neither Some Never 185.00   Metric
147   Male   22.5   22.6 Right  L on R    64   Right Freq Regul 187.96 Imperial
148   Male   19.5   20.2 Right  R on L    60 Neither Freq Never 185.42 Imperial
149 Female   18.0   18.0 Right  L on R    92 Neither Freq Never 165.00   Metric
150 Female   18.0   18.5 Right  R on L    64 Neither Freq Never 164.00   Metric
151   Male   21.8   22.3 Right  R on L    76    Left Freq Never 195.00   Metric
152 Female   13.0   12.5 Right  L on R    80   Right Freq Never 165.00   Metric
153 Female   16.3   16.2 Right  L on R    92   Right Some Regul 152.40 Imperial
154   Male   21.5   21.6 Right  R on L    69   Right Freq Never 172.72 Imperial
155   Male   18.9   19.1 Right  L on R    68   Right None Never 180.34 Imperial
156   Male   20.5   20.0 Right  R on L    76   Right Freq Never 173.00   Metric
157   Male   14.0   15.5 Right  L on R    NA Neither Freq Heavy     NA     <NA>
158 Female   18.9   19.2 Right  L on R    74   Right Some Never 167.64 Imperial
159   Male   20.0   20.5 Right  R on L    NA   Right None Never 187.96 Imperial
160   Male   18.5   19.0 Right  L on R    84   Right Freq Regul 187.00   Metric
161 Female   17.5   17.1 Right  R on L    80    Left None Never 167.00   Metric
162   Male   18.1   18.2  Left Neither    NA   Right Some Never 168.00   Metric
163   Male   20.2   20.3 Right  L on R    72 Neither Some Never 191.80 Imperial
164 Female   16.5   16.9 Right  R on L    60 Neither Freq Occas 169.20   Metric
165   Male   19.1   19.1 Right Neither    NA   Right Some Never 177.00   Metric
166 Female   17.6   17.2 Right  R on L    81    Left Some Never 168.00   Metric
167 Female   19.5   19.2 Right  R on L    70   Right Some Never 170.00   Metric
168 Female   16.5   15.0 Right  L on R    65   Right Some Regul 160.02 Imperial
169   Male   19.0   18.5 Right  L on R    NA Neither Freq Never 189.00   Metric
170   Male   19.0   18.5 Right  R on L    72   Right Freq Never 180.34 Imperial
171 Female   16.5   17.0 Right  L on R    NA   Right Some Never 168.00   Metric
172   Male   20.5   19.5  Left  L on R    80   Right Some Occas 182.88 Imperial
173 Female   15.5   15.5 Right Neither    50   Right Some Regul     NA     <NA>
174 Female   18.0   17.5 Right  R on L    48 Neither Freq Never 165.00   Metric
175 Female   17.5   18.0 Right  R on L    68 Neither Freq Never 157.48 Imperial
176 Female   19.0   18.5  Left  L on R   104    Left Freq Never 170.00   Metric
177   Male   20.5   20.5 Right Neither    76   Right Freq Regul 172.72 Imperial
178 Female   16.7   17.0 Right  L on R    84    Left Freq Never 164.00   Metric
179 Female   20.5   20.5 Right  R on L    NA    Left Freq Regul     NA     <NA>
180 Female   17.0   16.5 Right  R on L    70   Right Some Never 162.56 Imperial
181   Male   19.0   19.5 Right  R on L    68   Right Freq Occas 172.00   Metric
182 Female   14.0   13.5 Right  R on L    87 Neither Freq Occas 165.10 Imperial
183 Female   17.5   17.6 Right  L on R    79   Right Some Never 162.50   Metric
184   Male   18.5   19.0 Right  L on R    70    Left Freq Never 170.00   Metric
185   Male   18.0   18.5 Right Neither    90   Right Some Never 175.00   Metric
186   Male   20.5   20.7 Right  R on L    72   Right Some Never 168.00   Metric
187 Female   17.0   17.0 Right  L on R    79   Right Some Never 163.00   Metric
188   Male   18.5   18.5 Right  R on L    65   Right None Never 165.00   Metric
189   Male   18.0   18.5 Right  R on L    62   Right Freq Never 173.00   Metric
190   Male   18.5   18.0 Right Neither    63 Neither Freq Never 196.00   Metric
191   Male   20.0   19.5 Right  R on L    92   Right Some Never 179.10 Imperial
192   Male   22.0   22.5 Right  L on R    60   Right Some Never 180.00   Metric
193   Male   17.9   18.4 Right  R on L    68    Left None Occas 176.00   Metric
194 Female   17.6   17.8 Right  L on R    72    Left Some Never 160.02 Imperial
195 Female   16.7   15.1 Right Neither    NA   Right None Never 157.48 Imperial
196 Female   17.0   17.6 Right  L on R    76   Right Some Never 165.00   Metric
197 Female   15.0   13.0 Right  R on L    80 Neither Freq Never 170.18 Imperial
198   Male   16.0   15.5 Right Neither    71   Right Freq Never 154.94 Imperial
199 Female   19.1   19.0 Right  R on L    80   Right Some Occas 170.00   Metric
200 Female   17.5   16.5 Right  R on L    80 Neither Some Never 164.00   Metric
201 Female   16.2   15.8 Right  R on L    61   Right Some Occas 167.00   Metric
202   Male   21.0   21.0 Right  L on R    48 Neither Freq Never 174.00   Metric
203 Female   18.8   17.8 Right  R on L    76   Right Some Never     NA     <NA>
204 Female   18.5   18.0 Right Neither    86   Right None Never 160.00   Metric
205   Male   17.0   17.5 Right  R on L    80   Right Some Regul 179.10   Metric
206 Female   17.5   17.0 Right  R on L    83 Neither Freq Occas 168.00   Metric
207 Female   17.5   17.6 Right  L on R    76   Right Some Never 153.50   Metric
208   Male   17.5   17.6 Right  R on L    84   Right Some Never 160.00   Metric
209   Male   17.5   17.0  Left  L on R    97 Neither None Never 165.00   Metric
210 Female   20.8   20.7 Right  R on L    NA Neither Freq Never 171.50   Metric
211 Female   18.6   18.6 Right  L on R    74   Right Some Never 160.00   Metric
212 Female   17.5   17.5  Left  R on L    83 Neither Some Never 163.00   Metric
213   Male   18.0   18.5 Right  R on L    78   Right Freq Never     NA     <NA>
214   Male   17.0   17.5 Right  R on L    65   Right Some Never 165.00   Metric
215 Female   18.0   17.8 Right  L on R    68   Right Some Never 168.90 Imperial
216   Male   19.5   20.0 Right Neither    NA   Right Some Never 170.00   Metric
217 Female   16.3   16.2 Right  L on R    NA   Right None Never     NA     <NA>
218   Male   18.2   19.8 Right  R on L    88   Right Freq Never 185.00   Metric
219 Female   17.0   17.3 Right  L on R    NA Neither Freq Never 173.00   Metric
220   Male   23.2   23.2 Right  L on R    75   Right Freq Never 188.00   Metric
221   Male   23.2   23.3 Right  L on R    NA   Right None Heavy 171.00   Metric
222 Female   15.9   16.5 Right  R on L    70   Right Freq Never 167.64 Imperial
223 Female   17.5   18.4 Right  R on L    88   Right Some Never 162.56 Imperial
224 Female   17.5   17.6 Right  L on R    NA   Right Freq Never 150.00   Metric
225 Female   17.6   17.2 Right  L on R    NA   Right Some Never     NA     <NA>
226 Female   17.5   17.8 Right  R on L    96   Right Some Never     NA     <NA>
227 Female   18.8   18.3 Right  R on L    80   Right Some Heavy 170.18 Imperial
228   Male   20.0   19.8 Right  L on R    68   Right Freq Never 185.00   Metric
229 Female   18.6   18.8 Right  L on R    70   Right Freq Regul 167.00   Metric
230   Male   18.6   19.6 Right  L on R    71   Right Freq Occas 185.00   Metric
231 Female   18.8   18.5 Right  R on L    80   Right Some Never 169.00   Metric
232   Male   18.0   16.0 Right  R on L    NA   Right Some Never 180.34 Imperial
233 Female   18.0   18.0 Right  L on R    85   Right Some Never 165.10 Imperial
234 Female   18.5   18.0 Right  L on R    88   Right Some Never 160.00   Metric
235 Female   17.5   16.5 Right  R on L    NA   Right Some Never 170.00   Metric
236   Male   21.0   21.5 Right  R on L    90   Right Some Never 183.00   Metric
237 Female   17.6   17.3 Right  R on L    85   Right Freq Never 168.50   Metric
       Age
1   18.250
2   17.583
3   16.917
4   20.333
5   23.667
6   21.000
7   18.833
8   35.833
9   19.000
10  22.333
11  28.500
12  18.250
13  18.750
14  17.500
15  17.167
16  17.167
17  19.333
18  18.333
19  19.750
20  17.917
21  17.917
22  18.167
23  17.833
24  18.250
25  19.167
26  17.583
27  17.500
28  18.083
29  21.917
30  19.250
31  41.583
32  17.500
33  39.750
34  17.167
35  17.750
36  18.000
37  19.000
38  17.917
39  35.500
40  19.917
41  17.500
42  17.083
43  28.583
44  17.500
45  17.417
46  18.500
47  18.917
48  19.417
49  18.417
50  30.750
51  18.500
52  17.500
53  18.333
54  17.417
55  20.000
56  18.333
57  17.167
58  17.417
59  17.667
60  18.417
61  20.333
62  17.333
63  17.500
64  19.833
65  18.583
66  18.000
67  30.667
68  16.917
69  19.917
70  18.333
71  17.583
72  17.833
73  17.667
74  17.417
75  17.750
76  20.667
77  23.583
78  17.167
79  17.083
80  18.750
81  16.750
82  20.167
83  17.667
84  17.167
85  17.167
86  17.250
87  18.000
88  18.750
89  21.583
90  17.583
91  19.667
92  18.000
93  19.667
94  17.083
95  22.833
96  17.083
97  19.417
98  23.250
99  18.083
100 19.083
101 18.917
102 17.750
103 20.833
104 20.167
105 17.667
106 18.250
107 17.000
108 18.500
109 18.583
110 17.750
111 24.167
112 18.167
113 21.167
114 17.917
115 17.417
116 20.500
117 22.917
118 18.917
119 18.917
120 20.083
121 17.500
122 18.250
123 17.500
124 17.417
125 21.000
126 19.833
127 17.667
128 18.083
129 18.000
130 18.333
131 20.000
132 18.750
133 19.083
134 18.500
135 18.417
136 19.167
137 21.500
138 19.333
139 21.417
140 18.667
141 17.500
142 21.083
143 17.250
144 19.000
145 19.167
146 19.000
147 23.000
148 32.667
149 20.000
150 20.167
151 25.500
152 18.167
153 23.500
154 70.417
155 43.833
156 23.583
157 21.083
158 44.250
159 19.667
160 17.917
161 18.417
162 21.167
163 17.500
164 29.083
165 19.917
166 18.500
167 18.167
168 32.750
169 17.417
170 17.333
171 73.000
172 18.667
173 18.500
174 18.667
175 17.750
176 17.250
177 36.583
178 23.083
179 19.250
180 17.167
181 23.417
182 17.083
183 17.250
184 23.833
185 18.750
186 21.167
187 24.667
188 18.500
189 20.333
190 20.083
191 18.917
192 27.333
193 18.917
194 17.250
195 18.167
196 26.500
197 17.000
198 17.167
199 19.167
200 17.500
201 19.250
202 21.333
203 18.583
204 20.167
205 18.667
206 17.083
207 17.417
208 18.583
209 19.500
210 18.500
211 17.167
212 17.250
213 17.500
214 20.417
215 17.083
216 21.250
217 19.250
218 19.333
219 19.167
220 18.917
221 20.917
222 17.333
223 18.167
224 20.750
225 19.917
226 18.667
227 18.417
228 17.417
229 20.333
230 19.333
231 18.167
232 20.750
233 17.667
234 16.917
235 18.583
236 17.167
237 17.750
survey <- as_tibble(survey)

# check the structure of the data

str(survey) 
tibble [237 × 12] (S3: tbl_df/tbl/data.frame)
 $ Sex   : Factor w/ 2 levels "Female","Male": 1 2 2 2 2 1 2 1 2 2 ...
 $ Wr.Hnd: num [1:237] 18.5 19.5 18 18.8 20 18 17.7 17 20 18.5 ...
 $ NW.Hnd: num [1:237] 18 20.5 13.3 18.9 20 17.7 17.7 17.3 19.5 18.5 ...
 $ W.Hnd : Factor w/ 2 levels "Left","Right": 2 1 2 2 2 2 2 2 2 2 ...
 $ Fold  : Factor w/ 3 levels "L on R","Neither",..: 3 3 1 3 2 1 1 3 3 3 ...
 $ Pulse : int [1:237] 92 104 87 NA 35 64 83 74 72 90 ...
 $ Clap  : Factor w/ 3 levels "Left","Neither",..: 1 1 2 2 3 3 3 3 3 3 ...
 $ Exer  : Factor w/ 3 levels "Freq","None",..: 3 2 2 2 3 3 1 1 3 3 ...
 $ Smoke : Factor w/ 4 levels "Heavy","Never",..: 2 4 3 2 2 2 2 2 2 2 ...
 $ Height: num [1:237] 173 178 NA 160 165 ...
 $ M.I   : Factor w/ 2 levels "Imperial","Metric": 2 1 NA 2 2 1 1 2 2 2 ...
 $ Age   : num [1:237] 18.2 17.6 16.9 20.3 23.7 ...
pairs(survey)

# subset the data

df1 <- survey %>% 
  dplyr::select(Wr.Hnd, NW.Hnd, Pulse, Height, Age) 
df1 
# A tibble: 237 × 5
   Wr.Hnd NW.Hnd Pulse Height   Age
    <dbl>  <dbl> <int>  <dbl> <dbl>
 1   18.5   18      92   173   18.2
 2   19.5   20.5   104   178.  17.6
 3   18     13.3    87    NA   16.9
 4   18.8   18.9    NA   160   20.3
 5   20     20      35   165   23.7
 6   18     17.7    64   173.  21  
 7   17.7   17.7    83   183.  18.8
 8   17     17.3    74   157   35.8
 9   20     19.5    72   175   19  
10   18.5   18.5    90   167   22.3
# ℹ 227 more rows
pairs(df1)

# build our model with one predictor (two responses: Height and Pulse)

mlm1 <- lm(cbind(Height, Pulse) ~ Age, data = df1) 

summary(mlm1)
Response Height :

Call:
lm(formula = Height ~ Age, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max 
-20.608  -7.528  -1.583   7.379  27.399 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 173.39245    2.66936  64.957   <2e-16 ***
Age          -0.04278    0.12503  -0.342    0.733    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.916 on 169 degrees of freedom
  (66 observations deleted due to missingness)
Multiple R-squared:  0.0006923, Adjusted R-squared:  -0.005221 
F-statistic: 0.1171 on 1 and 169 DF,  p-value: 0.7327


Response Pulse :

Call:
lm(formula = Pulse ~ Age, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.130  -7.160  -0.721   6.360  29.381 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  78.9229     3.0761  25.657   <2e-16 ***
Age          -0.2448     0.1441  -1.699   0.0912 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.43 on 169 degrees of freedom
  (66 observations deleted due to missingness)
Multiple R-squared:  0.01679,   Adjusted R-squared:  0.01097 
F-statistic: 2.886 on 1 and 169 DF,  p-value: 0.09118
# build our model with more than one predictor

mlm2 <- lm(cbind(Height, Pulse) ~ Age + Wr.Hnd + NW.Hnd, data = df1) 

summary(mlm2)
Response Height :

Call:
lm(formula = Height ~ Age + Wr.Hnd + NW.Hnd, data = df1)

Residuals:
     Min       1Q   Median       3Q      Max 
-20.5028  -4.9668  -0.9197   4.3439  25.6729 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 115.8383     5.9953  19.322   <2e-16 ***
Age          -0.1341     0.1004  -1.337   0.1832    
Wr.Hnd        2.7889     1.2009   2.322   0.0214 *  
NW.Hnd        0.3776     1.1727   0.322   0.7478    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.823 on 166 degrees of freedom
  (67 observations deleted due to missingness)
Multiple R-squared:  0.389, Adjusted R-squared:  0.378 
F-statistic: 35.23 on 3 and 166 DF,  p-value: < 2.2e-16


Response Pulse :

Call:
lm(formula = Pulse ~ Age + Wr.Hnd + NW.Hnd, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max 
-38.244  -6.902  -0.928   6.238  29.893 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  78.3243     8.8045   8.896 9.54e-16 ***
Age          -0.2242     0.1474  -1.521    0.130    
Wr.Hnd        0.5060     1.7636   0.287    0.775    
NW.Hnd       -0.4947     1.7221  -0.287    0.774    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 11.49 on 166 degrees of freedom
  (67 observations deleted due to missingness)
Multiple R-squared:  0.01518,   Adjusted R-squared:  -0.002614 
F-statistic: 0.8531 on 3 and 166 DF,  p-value: 0.4668
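The per-response summaries above suggest that a multivariate lm() is a set of separate regressions sharing the same rows. A minimal sketch of that check follows. It assumes the MASS package is installed, and the names vars, complete, mfit, hfit, and pfit are illustrative.

```r
library(MASS)  # provides the survey data

vars <- c("Height", "Pulse", "Age", "Wr.Hnd", "NW.Hnd")

# Keep only rows with no missing value in ANY model variable
complete <- na.omit(as.data.frame(survey)[, vars])

# Multivariate fit: two responses, three predictors
mfit <- lm(cbind(Height, Pulse) ~ Age + Wr.Hnd + NW.Hnd, data = survey)

# Separate fits on the shared complete cases
hfit <- lm(Height ~ Age + Wr.Hnd + NW.Hnd, data = complete)
pfit <- lm(Pulse ~ Age + Wr.Hnd + NW.Hnd, data = complete)

all.equal(coef(mfit)[, "Height"], coef(hfit))
all.equal(coef(mfit)[, "Pulse"], coef(pfit))
```

Note the design consequence: a row missing Pulse is also dropped from the Height equation, so fitting Height alone on the full data could use more observations and give slightly different estimates.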
head(resid(mlm1)) # residuals 
       Height      Pulse
1   0.3883076  17.544343
2   5.1597724  29.381073
5  -7.3799460 -38.129668
6   0.2259562  -9.782504
7  10.2932491   8.687051
8 -14.8594685   3.848360
head(fitted(mlm1)) # fitted values from the model
    Height    Pulse
1 172.6117 74.45566
2 172.6402 74.61893
5 172.3799 73.12967
6 172.4940 73.78250
7 172.5868 74.31295
8 171.8595 70.15164
head(resid(mlm2)) # residuals 
      Height      Pulse
1   1.217489  17.311649
2   2.195023  29.893015
5 -10.994527 -38.243539
6   2.814107  -9.967349
7  13.520106   8.698683
8  -7.976292   3.665699
head(fitted(mlm2)) # fitted values from the model
    Height    Pulse
1 171.7825 74.68835
2 175.6050 74.10699
5 175.9945 73.24354
6 169.9059 73.96735
7 169.3599 74.30132
8 164.9763 70.33430
# gather coefficients

coef(mlm2)
                 Height      Pulse
(Intercept) 115.8382861 78.3242718
Age          -0.1341361 -0.2241610
Wr.Hnd        2.7889054  0.5059615
NW.Hnd        0.3776366 -0.4947372
# variance-covariance matrix of the coefficient estimates

vcov(mlm2)
                   Height:(Intercept)   Height:Age Height:Wr.Hnd Height:NW.Hnd
Height:(Intercept)        35.94337245 -0.154686949  -1.874139393   0.147649352
Height:Age                -0.15468695  0.010072486   0.012318156  -0.015095498
Height:Wr.Hnd             -1.87413939  0.012318156   1.442122756  -1.361112632
Height:NW.Hnd              0.14764935 -0.015095498  -1.361112632   1.375140647
Pulse:(Intercept)         -6.06537128  0.026103109   0.316257226  -0.024915529
Pulse:Age                  0.02610311 -0.001699712  -0.002078664   0.002547335
Pulse:Wr.Hnd               0.31625723 -0.002078664  -0.243355293   0.229684999
Pulse:NW.Hnd              -0.02491553  0.002547335   0.229684999  -0.232052198
                   Pulse:(Intercept)    Pulse:Age Pulse:Wr.Hnd Pulse:NW.Hnd
Height:(Intercept)       -6.06537128  0.026103109  0.316257226 -0.024915529
Height:Age                0.02610311 -0.001699712 -0.002078664  0.002547335
Height:Wr.Hnd             0.31625723 -0.002078664 -0.243355293  0.229684999
Height:NW.Hnd            -0.02491553  0.002547335  0.229684999 -0.232052198
Pulse:(Intercept)        77.51891942 -0.333612690 -4.041948507  0.318434733
Pulse:Age                -0.33361269  0.021723288  0.026566515 -0.032556397
Pulse:Wr.Hnd             -4.04194851  0.026566515  3.110220051 -2.935505860
Pulse:NW.Hnd              0.31843473 -0.032556397 -2.935505860  2.965760021
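The diagonal of this matrix holds the variances of the coefficient estimates, so its square roots should reproduce the standard errors in the summary tables. The sketch below assumes the MASS package is installed and refits the model on the survey data; the names mfit and se are illustrative.

```r
library(MASS)  # provides the survey data

mfit <- lm(cbind(Height, Pulse) ~ Age + Wr.Hnd + NW.Hnd, data = survey)

# Standard errors are the square roots of the diagonal of vcov()
se <- sqrt(diag(vcov(mfit)))

# Compare with the Std. Error column reported for Wr.Hnd (Height) and Age (Pulse)
round(se[c("Height:Wr.Hnd", "Pulse:Age")], 4)
```

The off-diagonal blocks (for example, Height:Age with Pulse:Age) describe how estimates in the two response equations covary, which is what the multivariate fit adds over two unrelated regressions.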

Next up: Week 15